Asymptotics for High Dimension, Low Sample Size data and Analysis of Data on Manifolds

Authors

  • Sungkyu Jung
  • Yufeng Liu
  • Jan Hannig
Abstract

SUNGKYU JUNG: Asymptotics for High Dimension, Low Sample Size data and Analysis of Data on Manifolds. (Under the direction of Dr. J. S. Marron.)

The dissertation consists of two research topics that concern modern, non-standard data-analytic situations: data in the High Dimension, Low Sample Size (HDLSS) setting, and data lying on manifolds. Both situations arise in statistical image and shape analysis.

The first topic is an asymptotic study of the high-dimensional covariance matrix. It analyzes the behavior of the eigenvalues and eigenvectors of the covariance matrix, which underlie Principal Component Analysis (PCA). We investigate the asymptotic behavior of the Principal Component (PC) directions as the dimension tends to infinity while the sample size stays fixed. We have found mathematical conditions that characterize the consistency and the strong inconsistency of the empirical PC direction vectors. We also identify the conditions under which the empirical PC direction vectors are neither consistent nor strongly inconsistent, and we present the limiting distributions of the angle between the empirical PC direction and its population counterpart. These findings help explain the use of PCA in the HDLSS context, which is justified when the conditions for consistency hold.

The second part of the dissertation studies methods for analyzing data that lie on curved manifolds, such as features extracted from shapes or images. A common goal in statistical shape analysis is to understand the variation of shapes. Dimension reduction and visualization of manifold data therefore call for PCA-like methods. We propose two flexible extensions of PCA to manifold data: Principal Arc Analysis and Analysis of Principal Nested Spheres. The methods are applied to two important types of manifolds. The first is the sample space of the medial representation of shapes, which image analysis frequently uses to parameterize the shape of human organs. This space naturally forms a curved manifold, which we characterize as a direct product manifold. The other type of manifold we consider is the landmark-based
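The consistency and strong-inconsistency regimes described above can be illustrated numerically. The following is a minimal simulation sketch. It assumes a hypothetical toy spiked model that is not stated in the abstract: the covariance is diag(d^α, 1, …, 1), so the population first PC direction is the first coordinate axis. The sketch keeps n fixed while d grows. The helper name `pc1_angle_deg` and the chosen values of α are illustrative assumptions. Under this toy model, the sample PC1 angle should shrink toward 0° for a strong spike (α = 1.5) and drift toward 90° for a weak spike (α = 0.5).

```python
import numpy as np


def pc1_angle_deg(d, n, alpha, rng):
    """Angle in degrees between the sample and population first PC directions.

    Hypothetical toy spiked model (an assumption, not the dissertation's
    exact setting): covariance diag(d**alpha, 1, ..., 1), so the
    population first PC direction is the first coordinate axis e_1.
    """
    x = rng.standard_normal((n, d))
    x[:, 0] *= np.sqrt(d ** alpha)  # spike: variance d**alpha along e_1
    x -= x.mean(axis=0)             # center each variable
    # The top right singular vector of the centered n x d data matrix is
    # the sample PC1 direction; this avoids forming a d x d covariance.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    cos = min(1.0, abs(vt[0, 0]))   # |<u_1, e_1>|; the sign of u_1 is arbitrary
    return float(np.degrees(np.arccos(cos)))


rng = np.random.default_rng(0)
dims = (100, 1_000, 10_000)
results = {}
for alpha in (1.5, 0.5):
    results[alpha] = [pc1_angle_deg(d, n=10, alpha=alpha, rng=rng) for d in dims]
    print(f"alpha={alpha}:", [round(a, 1) for a in results[alpha]])
```

The sketch uses the singular value decomposition of the n × d data matrix instead of the d × d sample covariance. In the HDLSS setting that matrix would be large and has rank at most n − 1.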


Similar resources

Geometric representation of high dimension, low sample size data

High dimension, low sample size data are emerging in various areas of science. We find a common structure underlying many such data sets by using a non-standard type of asymptotics: the dimension tends to infinity while the sample size is fixed. Our analysis shows a tendency for the data to lie deterministically at the vertices of a regular simplex. Essentially all the randomness in the data appears o...
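The simplex geometry can be checked with a short simulation. The sketch below assumes n i.i.d. draws from a standard Gaussian N(0, I_d), a simple illustrative model rather than the paper's general conditions. The helper name `scaled_geometry` is hypothetical. After dividing by √d, each point's distance from the origin should concentrate near 1 and each pairwise distance near √2 as d grows. Points that are all equidistant from each other form the vertices of a regular simplex.

```python
import numpy as np


def scaled_geometry(n, d, rng):
    """Return (norms, pairwise distances), both divided by sqrt(d).

    Assumed model for illustration only: n i.i.d. draws from N(0, I_d).
    """
    x = rng.standard_normal((n, d))
    norms = np.linalg.norm(x, axis=1) / np.sqrt(d)
    diffs = x[:, None, :] - x[None, :, :]          # shape (n, n, d)
    dist = np.linalg.norm(diffs, axis=-1) / np.sqrt(d)
    off_diag = dist[~np.eye(n, dtype=bool)]        # drop zero self-distances
    return norms, off_diag


rng = np.random.default_rng(1)
for d in (10, 1_000, 100_000):
    norms, dists = scaled_geometry(n=5, d=d, rng=rng)
    print(f"d={d}: norms in [{norms.min():.3f}, {norms.max():.3f}], "
          f"distances in [{dists.min():.3f}, {dists.max():.3f}]")
```

As d increases, the printed ranges should narrow. In this toy model, the random spread of the distances shrinks roughly like 1/√d relative to their common limit.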


Decomposability of high-dimensional diversity measures: Quasi-U-statistics, martingales and nonstandard asymptotics

In complex diversity analysis, especially as it arises in genetics, genomics, ecology and other high-dimensional (and sometimes low sample size) data models, subgroup-decomposability (analogous to ANOVA decomposability) typically arises. For group divergence of diversity measures in a high-dimension low sample size scenario, it is shown that Hamming distance-type statistics lead to a general class of q...


Asymptotic Properties of Distance-Weighted Discrimination

While Distance-Weighted Discrimination (DWD) is an appealing approach to classification in high dimensions, it was designed for balanced data sets. In the case of unequal costs, biased sampling or unbalanced data, there are major improvements available, using appropriately weighted versions of DWD. A major contribution of this paper is the development of optimal weighting schemes for various no...


Largest Eigenvalue Estimation for High-Dimension, Low-Sample-Size Data and its Application

A common feature of high-dimensional data is that the data dimension is high while the sample size is relatively low. We call such data HDLSS data. In this paper, we study HDLSS asymptotics when the data dimension is high while the sample size is fixed. We first introduce two eigenvalue estimation methods: the noise-reduction (NR) methodology and the cross-data-matrix (CDM) methodology. We show ...


Boundary behavior in High Dimension, Low Sample Size asymptotics of PCA

In High Dimension, Low Sample Size (HDLSS) data situations, where the dimension d is much larger than the sample size n, principal component analysis (PCA) plays an important role in statistical analysis. Under which conditions does the sample PCA well reflect the population covariance structure? We answer this question in a relevant asymptotic context where d grows and n is fixed, under a gene...
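A boundary regime, in which the sample PC direction is neither consistent nor strongly inconsistent, can be sketched with a hypothetical toy spiked model. The model is an assumption, not taken from this snippet: the covariance is diag(d^α, 1, …, 1), and setting α = 1 places the spike exactly between the consistent (α > 1) and strongly inconsistent (α < 1) regimes. The helper `pc1_angle_deg` and the replication counts are illustrative choices. In this regime, one would expect the angle between the sample and population PC1 to remain random as d grows: its spread across replications should not collapse, and its mean should not drift to 0° or 90°.

```python
import numpy as np


def pc1_angle_deg(d, n, alpha, rng):
    """Angle (degrees) between sample and population PC1 under a toy spiked
    model with covariance diag(d**alpha, 1, ..., 1) (an assumption)."""
    x = rng.standard_normal((n, d))
    x[:, 0] *= np.sqrt(d ** alpha)       # spike along the first axis
    x -= x.mean(axis=0)                  # center each variable
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return float(np.degrees(np.arccos(min(1.0, abs(vt[0, 0])))))


rng = np.random.default_rng(2)
summary = {}
for d in (500, 5_000):
    # alpha = 1.0 is the boundary case in this toy model
    angles = np.array([pc1_angle_deg(d, n=10, alpha=1.0, rng=rng)
                       for _ in range(200)])
    summary[d] = (angles.mean(), angles.std())
    print(f"d={d}: mean angle {angles.mean():.1f} deg, "
          f"spread (sd) {angles.std():.1f} deg")
```

Comparing the two printed rows is the point of the sketch. A similar standard deviation at both dimensions is consistent with a nondegenerate limiting distribution of the angle.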



Publication date: 2011